The Relationship Between Income and Life Expectancy

Final Project

Author

Hayley C., Felicia P., Ian L., Shane W.

1 Library Imports

2 Data

2.1 Variables

2.2 Daily Mean Household Income Per Capita

GapMinder compiled data on daily mean household income per capita to analyze income distributions over hundreds of years. These figures are anchored in the official Mean Income indicator from the World Bank, derived from household surveys. For countries lacking World Bank data, GapMinder estimated mean income based on GDP per capita. The available time frame for actual World Bank data spans from 1967 to 2021, though most countries have limited data points within these years.

GapMinder used growth rates of constant dollar GDP per capita to estimate mean incomes historically from 1800 and project them up to 2100. For the period 1981-2019, they relied on World Bank data, known for its comprehensive coverage, published in the World Development Indicators as “Survey mean consumption or income per capita, total population (2017 PPP dollars per day).” The indicator we reference is described in the World Bank Poverty and Inequality Platform (PIP) as “Indicators Survey mean/average consumption or income per capita, total population (2017 PPP dollars per day). The mean represents the average monthly household per capita income or consumption expenditure from the survey in 2017 PPP.”

In short, the average daily income is the mean daily household per capita income or consumption expenditure from the survey, expressed in 2017 constant international dollars.

2.2.1 Life Expectancy

GapMinder collects life expectancy data from various sources to create a comprehensive dataset spanning from 1800 to 2100. Life expectancy at birth refers to the average number of years a newborn is expected to live, assuming that current mortality rates remain constant throughout their lifetime.

For the period from 1800 to 1970, GapMinder relies on its own compiled data (version 7), which includes information from over 100 sources and accounts for historical events causing significant mortality dips. From 1950 to 2019, data is primarily sourced from the Global Burden of Disease Study 2019 by the Institute for Health Metrics and Evaluation (IHME). This source provides detailed annual estimates. For projections from 2020 to 2100, GapMinder uses forecasts from the United Nations’ World Population Prospects 2022. The data is carefully combined, prioritizing IHME data when available, and extending IHME series with UN estimates for future projections.

2.3 Hypothesized Relationship Between the Variables

We hypothesize that average daily income is positively associated with life expectancy at birth: countries with higher average daily income tend to have higher life expectancy.

Column types (the same output was printed for both data sets, daily income and life expectancy):

    country           character
    X1800 to X2100    double (301 year columns, one per year)

2.4 How the Data was Cleaned

To clean the data, we inspected the data types of the values and found that every year column was stored as type character, even though the values are numeric. To fix this, we mutated each year’s column to a numeric type.

The year names initially had an X in front of the name when the data was first loaded. We chose to remove this naming convention after pivoting the data so that we can easily reference the years when graphing our data.

Rather than removing NA values at this stage, we left them in place so that, when joining the data, we could decide which years or countries to keep based on where the two data frames overlap.
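The cleaning step described above can be sketched as follows. This is a minimal illustration, not the report's actual code: the data frame `income_wide` and its values are made up, and dplyr is assumed because the text speaks of "mutating" columns.

```r
library(dplyr)

# Hypothetical toy stand-in for one wide table; the values are invented.
income_wide <- data.frame(
  country = c("A", "B"),
  X2009   = c("10.5", "3.2"),
  X2010   = c("11.0", NA),
  stringsAsFactors = FALSE
)

# Convert every year column (everything except country) to numeric.
# NA values are deliberately kept rather than dropped.
income_clean <- income_wide %>%
  mutate(across(-country, as.numeric))

# Inspect the resulting column types.
sapply(income_clean, typeof)
```

Keeping the NA values here defers the decision about which rows to drop until the two tables are joined.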

2.5 How the Data was Pivoted

Next, we pivoted the data by country to separate each year into individual observations. For each country and year, we now have the corresponding average daily income and average life expectancy.
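A pivot of this kind might look like the sketch below. It assumes tidyr's `pivot_longer()`; the table `income_clean`, its values, and the output column name `daily_income` are hypothetical.

```r
library(tidyr)

# Hypothetical cleaned wide table: one row per country, one column per year.
income_clean <- data.frame(
  country = c("A", "B"),
  X2009   = c(10.5, 3.2),
  X2010   = c(11.0, NA)
)

# Turn each year column into its own row: one observation per country and year.
income_long <- pivot_longer(
  income_clean,
  cols      = -country,
  names_to  = "year",
  values_to = "daily_income"
)

# Drop the leading "X" after pivoting and store the year as an integer.
income_long$year <- as.integer(sub("^X", "", income_long$year))

income_long
```

The same call applied to the life expectancy table, with a different `values_to` name, gives the second long table used in the join.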

2.6 How the Data was Joined

To create a single data table, we joined the two cleaned and pivoted data sets. One way to do this is an inner join, which also handles missing data by dropping any country and year pair that does not appear in both data sets.

In addition to joining the data, the name of the “country” column was capitalized in order to have uniformity among the variable names.
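The join and rename can be sketched as below, assuming dplyr. The two long tables and their values are invented for illustration, and the column names `daily_income` and `life_expectancy` are assumptions.

```r
library(dplyr)

# Hypothetical long tables: one row per country and year.
income_long <- data.frame(
  country      = c("A", "A", "B"),
  year         = c(2009L, 2010L, 2010L),
  daily_income = c(10.5, 11.0, 3.4)
)
life_long <- data.frame(
  country         = c("A", "B", "C"),
  year            = c(2010L, 2010L, 2010L),
  life_expectancy = c(71.2, 64.8, 80.1)
)

# Keep only country/year pairs present in both tables,
# then capitalize the country column name.
joined <- inner_join(income_long, life_long, by = c("country", "year")) %>%
  rename(Country = country)

joined
```

Note that an inner join removes unmatched keys only. A matched row that still contains an NA value would survive the join and would need to be filtered separately.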

3 Linear Regression

3.1 Exploring the Relationship Between the Two Variables

The variables to be explored are average daily income and average life expectancy. Specifically, we explore how income affects life expectancy.

The explanatory variable is the average income and the response variable is the average life expectancy.

To explore the relationship over time

3.2 Linear Regression

3.2.1 Steps to Choosing Regression Features

We simplified the linear regression by restricting it to the year 2010, because daily income and life expectancy have changed significantly over the centuries, making it challenging to capture the full extent of these trends in a single regression model.

Historical data from the 1800s to the present day illustrates substantial shifts in both daily income and life expectancy, reflecting changes in economic, social, and healthcare systems globally.

By selecting the year 2010 as a reference point, we aim to focus on a period that represents a modern snapshot of these trends. Here’s why 2010 is a good choice:

1. Representative Modern Era: 2010 serves as a representative point in the modern era, offering insights into contemporary socioeconomic and health conditions across countries.

2. Mitigation of Predicted Data: The decision to exclude years beyond 2010 accounts for the absence of actual data and instead focuses on observed trends. This approach prevents potential biases introduced by predicted data, particularly in later years beyond the data collection timeframe.

3. Adequate Time for Analysis: With 14 years having passed since 2010, this timeframe provides sufficient data for analysis while minimizing the impact of short-term fluctuations that may occur within smaller time intervals.

By anchoring our analysis to the year 2010, we aim to capture meaningful trends in daily income and life expectancy while ensuring the reliability and relevance of our linear regression model.

3.2.2 Regression Code


Call:
lm(formula = life_expectancy_2010 ~ daily_income_2010, data = average_data_years)

Coefficients:
      (Intercept)  daily_income_2010  
          65.1162             0.2669  

The linear regression formula is \(\hat{y} = 0.2669x + 65.1162\) where \(x\) is the daily income in 2010 and \(\hat{y}\) is the predicted life expectancy in 2010.
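A fit of this form can be sketched with base R's `lm()`. The variable and data frame names follow the call shown in the output above, but the five data points are invented, so the coefficients this sketch produces will not match the reported ones.

```r
# Hypothetical toy data; the real analysis used one row per country for 2010.
average_data_years <- data.frame(
  daily_income_2010    = c(2, 5, 10, 20, 40),
  life_expectancy_2010 = c(60, 66, 70, 74, 80)
)

# Simple linear regression: life expectancy explained by daily income.
model <- lm(life_expectancy_2010 ~ daily_income_2010, data = average_data_years)

# Named vector holding the intercept and the slope.
coef(model)
```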

3.2.3 Interpretation of Coefficients

Intercept (65.1162): The intercept is the estimated life expectancy in 2010 when daily income is zero. This interpretation is not practically meaningful, since daily income cannot be zero. The intercept is better read as the baseline of the fitted line: the predicted life expectancy at any observed income is 65.1162 years plus 0.2669 years for each unit of daily income.

Daily Income Coefficient (0.2669): The coefficient indicates the estimated change in life expectancy for a one-unit increase in daily income. Because daily income is the only predictor in this model, no other variables are being held constant. On average, each additional dollar of daily income (2017 PPP) is associated with an increase of 0.2669 years in life expectancy.

These interpretations provide insights into the relationship between daily income and life expectancy in the year 2010, as captured by the estimated regression model.
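As a worked illustration of the coefficients, the sketch below plugs a few income values into the reported equation. The function name `predict_life_expectancy` and the chosen income values are hypothetical; only the two coefficients come from the output above.

```r
# Coefficients reported by the fitted model.
intercept <- 65.1162
slope     <- 0.2669

# Predicted life expectancy (years) for a given daily income (dollars per day).
predict_life_expectancy <- function(daily_income) {
  intercept + slope * daily_income
}

# Illustrative incomes of 0, 10, and 20 dollars per day.
predictions <- predict_life_expectancy(c(0, 10, 20))
predictions
# 65.1162 67.7852 70.4542
```

Each step of 10 dollars per day raises the prediction by 2.669 years, which is ten times the slope.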

3.3 Model Fit

Variance Summary of Regression Model

    Total Variance    Fitted Variance    Residual Variance
    75.7859           31.69877           44.08713

The total variance of the response values is 75.79 (in squared years), which represents the overall variability present in the response variable.

The fitted variance (31.70) is about 41.83% of the total variance (\(R^2 = 31.69877 / 75.7859 \approx 0.4183\)), which indicates that the model accounts for a substantial share of the variability in the response variable.

However, the residual variance (44.09) indicates that the majority of the variation (58.17%) in the response variable could not be explained. This leads to the conclusion that the quality of the model is closer to poor.
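The shares of explained and unexplained variation follow from dividing the fitted and residual variances by the total variance. The sketch below does this with the three reported values, then shows one way such a table could be computed from an `lm` object. The helper `variance_summary` and the toy data are hypothetical and are not the report's own code.

```r
# Reported variances (units: years squared).
total_var    <- 75.7859
fitted_var   <- 31.69877
residual_var <- 44.08713

# Share of variance explained (R squared) and left unexplained, in percent.
shares <- round(100 * c(explained   = fitted_var / total_var,
                        unexplained = residual_var / total_var), 2)
shares
# explained unexplained
#     41.83       58.17

# Hypothetical helper: the same three variances taken from a fitted lm model.
variance_summary <- function(model) {
  y <- model.response(model.frame(model))
  c(total    = var(y),
    fitted   = var(fitted(model)),
    residual = var(residuals(model)))
}

# Toy data, invented for illustration only.
toy   <- data.frame(x = c(2, 5, 10, 20, 40), y = c(60, 66, 70, 74, 80))
model <- lm(y ~ x, data = toy)
vs    <- variance_summary(model)
vs[["fitted"]] / vs[["total"]]  # equals summary(model)$r.squared
```

For a least-squares fit with an intercept, the fitted and residual variances add up to the total variance, which is why the two shares sum to 100%.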